# Replication Package

## Why Biden-Era Clean Energy Investment Policies Had Limited Political Returns

**Authors:** Alexander F. Gazmararian, Nathan Jensen, Dustin Tingley  
**Contact:** agazmararian@gmail.com  
**Generated:** 2026-02-22

---

## Overview

This replication package contains the code and data necessary to reproduce
all tables, figures, and statistics reported in the paper.

## System Requirements

- **R Version:** 4.4.3 (2025-02-28) or compatible
- **RAM:** 5 GB minimum, 8 GB recommended
  - Peak usage observed: ~3.2 GB
  - Average usage: ~1.8 GB
- **Disk Space:** ~8 GB for data and outputs (more on Windows when using parallel processing)
- **Runtime:** ~22 minutes (longer on first run, when `renv::restore()` installs packages)

Tested on macOS (Apple Silicon, aarch64-apple-darwin20, Darwin 23.6.0) and Windows Server x64 (AWS, x86_64-w64-mingw32).
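
A quick pre-flight check can confirm that the running R version is in the same series as the tested one before starting the full run. The sketch below uses only base R; the helper names `minor_series` and `check_environment` are illustrative and not part of the package.

```r
# Pre-flight check (illustrative, not part of the replication package).
tested_version <- "4.4.3"  # from the System Requirements in this README

# Reduce a version such as "4.4.3" to its major.minor series, e.g. "4.4".
minor_series <- function(v) {
  sub("^([0-9]+\\.[0-9]+).*$", "\\1", as.character(v))
}

check_environment <- function(tested = tested_version) {
  running <- getRversion()
  list(
    running_version   = as.character(running),
    same_minor_series = minor_series(running) == minor_series(tested),
    # system.file() returns "" when a package is not installed
    renv_installed    = nzchar(system.file(package = "renv"))
  )
}

str(check_environment())
```

A `FALSE` for `same_minor_series` does not necessarily mean the replication will fail; it only signals that the environment differs from the one that was tested.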

### Package Management

This project uses `renv` for reproducible package management.
When you run the replication (Options A–C below), `renv::restore()` runs
automatically so required packages are installed before the pipeline runs.

## Directory Structure

```
replication_package/
├── README.md                    # This file
├── run_replication.R           # Entry point script
├── renv.lock                    # Package versions
├── renv/                        # Package management
├── analysis/
│   ├── pnas_script_runner.R    # Main analysis runner
│   ├── statements/             # Statement analysis scripts
│   └── visibility/             # Visibility analysis scripts
├── R/                           # Helper functions
├── data/
│   ├── input/                  # Input data files
│   ├── cache/                  # All cached data
│   │   ├── annotations/        # GPT statement annotations
│   │   ├── geocoding/          # Survey ZIP geocoding
│   │   └── *.csv/*.rds         # Processed data derivatives
│   ├── inter/                  # [Generated] Intermediate files
│   └── output/                 # [Generated] Analysis outputs
└── output/
    └── pnas/
        ├── tables/             # [Generated] LaTeX tables
        ├── figures/            # [Generated] PDF figures
        ├── stats/              # [Generated] Summary statistics
        └── log/                # Reference checksums (output_checksums.csv)
```

## Instructions

### Option A: Command Line (Recommended)

```bash
cd /path/to/replication_package
Rscript run_replication.R
```

### Option B: RStudio

1. Open RStudio and set the working directory to the package folder
2. Run: `source("run_replication.R")`

### Option C: Interactive R Session

```r
setwd("/path/to/replication_package")
source("run_replication.R")
```

Each option runs all analysis scripts in order and generates:
- Tables in `output/pnas/tables/`
- Figures in `output/pnas/figures/`
- Statistics in `output/pnas/stats/`

### Verify Outputs

Compare the MD5 checksums of the generated files with the reference values in
`output/pnas/log/output_checksums.csv`, for example with a checksum utility or
a short script that reads the CSV and checks each listed path.
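
One possible form of such a script is sketched below using base R and `tools::md5sum()`. The column names `path` and `md5` are assumptions, since the layout of `output_checksums.csv` is not documented here; inspect the file's header and adjust `path_col` and `md5_col` as needed. The function name `verify_checksums` is illustrative.

```r
# Sketch: compare MD5 checksums of generated files against a reference CSV.
# ASSUMPTION: the CSV has a column of file paths (relative to `root`) and a
# column of MD5 hashes; the default column names below are hypothetical.
verify_checksums <- function(csv_file, root = ".",
                             path_col = "path", md5_col = "md5") {
  # Read everything as character so hashes are never parsed as numbers
  ref <- read.csv(csv_file, colClasses = "character")
  stopifnot(path_col %in% names(ref), md5_col %in% names(ref))

  # tools::md5sum() returns NA for files that do not exist
  observed <- unname(tools::md5sum(file.path(root, ref[[path_col]])))
  expected <- tolower(ref[[md5_col]])

  data.frame(
    path     = ref[[path_col]],
    expected = expected,
    observed = observed,
    status   = ifelse(is.na(observed), "missing",
                      ifelse(observed == expected, "ok", "mismatch")),
    stringsAsFactors = FALSE
  )
}

# Example call, run from the package root after the pipeline has finished:
# result <- verify_checksums("output/pnas/log/output_checksums.csv")
# table(result$status)
```

Rows with status `mismatch` or `missing` identify the outputs to inspect.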

## Data Sources

### Included Data

| Source | Description | Location |
|--------|-------------|----------|
| Qualtrics Surveys | Survey responses | `data/input/qualtrics_*/` |
| EIA-860M | Electric generating units | `data/input/EIA-860M/` |
| Census ACS | Demographic covariates (cached) | `data/cache/` |
| GPT Annotations | Statement classifications (cached) | `data/cache/annotations/` |

In the anonymized package, raw survey files and other files containing PII are omitted; replication instead uses the cached data in `data/cache/`, as described below.

### Cached Data (No Downloads Required)

The following data are pre-cached so that replication requires no API keys or internet access:

| Data | Original Source | Cached File |
|------|-----------------|-------------|
| Census ACS 2023 | Census API (requires key) | `data/cache/acs2023_raw_download.rds` |
| ACS Survey Weights | Census API (requires key) | `data/cache/acs_weights_raw_2023.rds` |
| ACS Variable Definitions | Census API (requires key) | `data/cache/acs_variables_2023.rds` |
| DMA Assignments | Computed from coordinates | `data/cache/dma_assignments.csv` |
| Distance Calculations | Computed from coordinates | `data/cache/respondent_distance2project_processed.rds` |
| Conley Standard Errors | Computed from coordinates | `data/cache/vcov_conley_cache.rds` |
| Highway Counties | `tigris` (downloads from Census) | `data/cache/tigris_counties_2018.rds` |
| Alaska Counties | `tigris` (downloads from Census) | `data/cache/tigris_ak_counties_2020.rds` |

### Restricted Data

The following data sources require separate access:

| Source | Access | Notes |
|--------|--------|-------|
| Big Green Machine Dataset | [Contact authors](https://sites.google.com/view/biggreenmanufacturing) | Processed: `data/cache/turner_processed.csv` |

## Computational Notes

### GPT-Based Annotations (Important)

The statements analysis uses OpenAI's GPT API to classify political statements.
**GPT outputs are inherently non-deterministic**, even with `temperature = 0`.

**For exact reproducibility:**

- Pre-computed annotations are stored in `data/cache/annotations/` and will be
  used automatically when running the pipeline.
- **Do not** set `new_annotation <- TRUE` in `R/annotation/annotate.R`.
- Re-running the GPT API would produce similar but not identical classifications.
- Statistical conclusions are robust to annotation variation (validated via
  robustness checks with alternative codebooks), but exact numerical outputs
  will differ if annotations are regenerated.

### Random Seeds

All stochastic processes use fixed random seeds for reproducibility:

- Network visualization layout: `set.seed(42)` in `analyze.R`
- Power analysis simulations: `set.seed(10)` in `power_analysis.R`
- Bootstrap confidence intervals: `set.seed(42)` in `check_annotation_quality.R`
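
As a generic illustration of what the fixed seeds above provide (this is not code from the package), the sketch below shows that a resampling step repeats exactly when the seed is set first. The function name `draw_bootstrap_mean` and the data are made up for the example.

```r
# Illustration: setting the seed before a stochastic step makes it repeatable.
draw_bootstrap_mean <- function(x, seed) {
  set.seed(seed)
  mean(sample(x, replace = TRUE))  # one bootstrap resample of x
}

x <- c(2.1, 3.4, 5.9, 1.2, 4.8)  # made-up data

first  <- draw_bootstrap_mean(x, seed = 42)
second <- draw_bootstrap_mean(x, seed = 42)
identical(first, second)  # TRUE within the same R session
```

Seeded draws are only guaranteed to match under the same R version and RNG settings; for instance, the default algorithm behind `sample()` changed in R 3.6.0. This is one reason to use an R version compatible with the one listed under System Requirements.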

### DSL Bias-Adjusted Analyses (Cross-Platform Differences)

Tables S36-S38 use the `dsl` package (Egami et al. 2024) and may show small
numerical differences across platforms due to stochastic sample splitting and
cross-fitting; conclusions are unchanged. See [Egami et al. (2024)](https://naokiegami.com/paper/dsl_ss.pdf).
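
Because these tables can differ slightly across platforms, an exact checksum comparison may flag them even when the estimates agree closely. One way to check them instead is to compare the numbers up to a tolerance. The sketch below is an assumption-laden illustration: the helper names, the default tolerance of `0.01`, and the premise that both files share the same layout are all hypothetical choices, not part of the package.

```r
# Sketch: compare the numbers in two versions of a LaTeX table up to a tolerance.
# ASSUMPTION: both files have the same layout, so numbers appear in the same order.
extract_numbers <- function(lines) {
  # Pull out every integer or decimal, with an optional leading minus sign
  matches <- regmatches(lines, gregexpr("-?[0-9]+(\\.[0-9]+)?", lines))
  as.numeric(unlist(matches))
}

tables_agree <- function(file_a, file_b, tolerance = 0.01) {
  a <- extract_numbers(readLines(file_a))
  b <- extract_numbers(readLines(file_b))
  # Same count of numbers, and each pair within the absolute tolerance
  length(a) == length(b) && all(abs(a - b) <= tolerance)
}

# Example call with hypothetical file names:
# tables_agree("reference/table_s36.tex", "output/pnas/tables/table_s36.tex")
```

The tolerance should be chosen with the precision reported in the tables in mind.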

## Output Verification

Expected outputs: 38 tables, 22 figures, and 53 stat files.

Reference checksums for verification are in `output/pnas/log/output_checksums.csv`.
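
A quick first check, before comparing checksums, is to count the generated files against the expected totals. The sketch below assumes that tables are `.tex` files and figures are `.pdf` files, as the directory listing suggests, and counts every file under `stats/` regardless of extension. The function name `count_outputs` is illustrative.

```r
# Sketch: count generated outputs and compare with the expected totals.
# ASSUMPTION: tables are .tex and figures are .pdf; stat files are counted
# whatever their extension. Adjust the patterns if the package differs.
count_outputs <- function(root = "output/pnas") {
  n_files <- function(dir, pattern = NULL) {
    length(list.files(file.path(root, dir), pattern = pattern,
                      recursive = TRUE, ignore.case = TRUE))
  }
  c(tables  = n_files("tables",  "\\.tex$"),
    figures = n_files("figures", "\\.pdf$"),
    stats   = n_files("stats"))
}

expected <- c(tables = 38, figures = 22, stats = 53)  # totals from this README

# Run from the package root after the pipeline has finished:
# observed <- count_outputs()
# observed == expected
```

Matching counts do not prove the contents are correct, so the checksum comparison remains the definitive check.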

## Troubleshooting

### Package Installation Issues

If `renv::restore()` fails:
```r
renv::repair()  # Fix broken links
renv::restore() # Try again
```

### Windows: Rtools Required for Some Packages

This replication package is configured to prefer pre-compiled binary packages.
However, if you see errors mentioning `make not found` or compilation failures,
you need to install **Rtools**:

1. Download from: https://cran.r-project.org/bin/windows/Rtools/
2. Run the installer (use default settings)
3. Restart R/RStudio and try again

### Windows: R Installation

1. Download R from: https://cran.r-project.org/bin/windows/base/
2. Install with default settings
3. Add R to your `PATH` or call `Rscript` by its full path:
   ```powershell
   $env:Path += ";C:\Program Files\R\R-4.4.3\bin\x64"
   Rscript run_replication.R
   ```

### Windows: Parallel Processing

On Windows, parallel processing uses PSOCK clusters (8 workers by default).
This adds ~1.2-2 GB RAM overhead but provides ~3x speedup for distance calculations.

To adjust if you have memory constraints:
```r
# Reduce to 2 workers (~600 MB overhead)
options(parallel.windows.cores = 2)

# Or disable parallel processing entirely
options(parallel.windows.mode = "sequential")
```

Add these lines to the top of `run_replication.R` before sourcing other scripts.

### File Path Issues

All paths are constructed with `here()` (from the `here` package) for cross-platform compatibility.
The `run_replication.R` script automatically sets up the correct working directory.
If running interactively, ensure you `setwd()` to the package root before sourcing scripts.
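
For interactive sessions, a small guard can confirm that the working directory is the package root before any script is sourced. The sketch below checks for two files that the directory listing places at the root; the function name `is_package_root` is illustrative and not part of the package.

```r
# Sketch: check that the working directory looks like the package root.
is_package_root <- function(dir = getwd()) {
  # Both files sit at the top level of replication_package/
  all(file.exists(file.path(dir, c("run_replication.R", "renv.lock"))))
}

if (!is_package_root()) {
  message("Not in the package root; call setwd() on the ",
          "replication_package folder before sourcing scripts.")
}
```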

## License

This replication package is licensed under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/).
You may share and adapt these materials with appropriate attribution.

## Citation

```bibtex
@article{gazmararian2026,
  author = {Gazmararian, Alexander F. and Jensen, Nathan and Tingley, Dustin},
  title = {Why Biden-Era Clean Energy Investment Policies Had Limited Political Returns},
  journal = {Proceedings of the National Academy of Sciences},
  year = {2026},
  doi = {10.1073/pnas.2526802123}
}
```

